A causal view of compositional zero-shot recognition

Neural Information Processing Systems

People easily recognize new visual categories that are new combinations of known components. This compositional generalization capacity is critical for learning in real-world domains like vision and language, because the long tail of new combinations dominates the distribution. Unfortunately, learning systems struggle with compositional generalization because they often build on features that are correlated with class labels even when they are not essential for the class. This leads to consistent misclassification of samples from a new distribution, such as new combinations of known components. Here we describe an approach for compositional generalization that builds on causal ideas. First, we describe compositional zero-shot learning from a causal perspective, and propose to view zero-shot inference as answering the question: which intervention caused the image? Second, we present a causal-inspired embedding model that learns disentangled representations of the elementary components of visual objects from correlated (confounded) training data. We evaluate this approach on two datasets for predicting new combinations of attribute-object pairs: a well-controlled dataset of synthesized images and a real-world dataset of fine-grained types of shoes. We show improvements over strong baselines.
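The "which intervention caused the image?" reading of zero-shot inference can be sketched as a small scoring loop. The embeddings, the composition-by-addition, and the function names below are illustrative assumptions, not the paper's actual model: each attribute and each object contributes a prototype vector, and inference picks the attribute-object intervention whose composed prototype best explains the observed image feature.

```python
import numpy as np

# Hypothetical embeddings (illustrative only): each attribute and object
# maps to a prototype vector in a shared space; composing an
# (attribute, object) intervention is sketched here as vector addition.
ATTR = {"red": np.array([1.0, 0.0]), "blue": np.array([0.0, 1.0])}
OBJ = {"cube": np.array([1.0, 1.0]), "ball": np.array([-1.0, 1.0])}

def infer_intervention(image_feat):
    """Return the (attribute, object) pair whose intervention best
    explains the observed image feature: smallest distance to the
    composed prototype, i.e. highest likelihood under isotropic
    Gaussian noise."""
    best, best_dist = None, np.inf
    for a, va in ATTR.items():
        for o, vo in OBJ.items():
            dist = np.linalg.norm(image_feat - (va + vo))
            if dist < best_dist:
                best, best_dist = (a, o), dist
    return best
```

Under this toy observation model, the nearest composed prototype is the maximum-likelihood intervention, even for attribute-object pairs never seen together during training.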


A Causal View on Robustness of Neural Networks

Neural Information Processing Systems

We present a causal view on the robustness of neural networks against input manipulations, which applies not only to traditional classification tasks but also to general measurement data. Based on this view, we design a deep causal manipulation augmented model (deep CAMA) that explicitly models possible manipulations of certain causes leading to changes in the observed effect. We further develop data augmentation and test-time fine-tuning methods to improve deep CAMA's robustness. Compared with discriminative deep neural networks, our proposed model shows superior robustness against unseen manipulations. As a by-product, our model achieves a disentangled representation that separates the representation of manipulations from those of other latent causes.
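A minimal sketch of the deep CAMA idea, with illustrative names and a drastically simplified model (a frozen class mean plus a shared shift standing in for the manipulation latent, rather than the paper's deep generative model): classification uses the manipulated means, and test-time fine-tuning updates only the manipulation variable `m` while class-specific parameters stay frozen.

```python
import numpy as np

# Toy stand-in for deep CAMA (illustrative, not the paper's architecture):
# observations are modeled as x = mu[y] + m + noise, where mu[y] is a
# frozen class-specific cause and m is a shared manipulation (a shift).
MU = {0: np.array([0.0, 0.0]), 1: np.array([4.0, 4.0])}

def classify(x, m=np.zeros(2)):
    """Pick the class whose manipulated mean mu[y] + m is closest to x
    (maximum likelihood under isotropic Gaussian noise)."""
    return min(MU, key=lambda y: np.linalg.norm(x - (MU[y] + m)))

def fit_manipulation(test_batch, n_iters=10):
    """Test-time fine-tuning: estimate m from an unlabeled test batch
    with a crude EM-style loop, never touching the class means MU."""
    m = np.zeros(2)
    for _ in range(n_iters):
        residuals = [x - MU[classify(x, m)] for x in test_batch]
        m = np.mean(residuals, axis=0)
    return m
```

On a batch shifted by an unseen manipulation, the frozen classifier fails, but after fitting `m` it recovers; updating only the manipulation branch at test time mirrors the robustness-to-unseen-manipulations claim in spirit.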


Review for NeurIPS paper: A causal view of compositional zero-shot recognition

Neural Information Processing Systems

Weaknesses: * This method is most suitable for variables that have a single parent in the causal DAG -- the class label. This severely restricts the class of attributes that can be modeled, and it manifests in the paper as experiments with simple attributes (colors in AO-CLEVr, and materials in Zappos). In fact, prior work has noted that attributes (or other compositional modifiers) manifest very differently for different objects ([36] gives examples from prior work: "fluffy" for towels vs. dogs, "ripe" for one fruit vs. another, etc.). For these attributes, and many others, the data-generating process is not so straightforward -- there are edges from both attribute labels and object labels to the core features. The authors do acknowledge this limitation in L326; however, it is an important weakness to consider given that _difficult_ instances in real-world datasets (where both object and attribute are parents of \phi_a, for example) are fairly prevalent.


Review for NeurIPS paper: A causal view of compositional zero-shot recognition

Neural Information Processing Systems

All four reviewers appreciated the neat idea contained in this paper, which is also shown to work well in practice. The authors open up the way for studying data-generation processes through causal interventions, which is a novel and technically interesting direction. Most importantly, it is a significant direction that is expected to stimulate further research in the field. I am recommending acceptance of this paper; however, please consider revising the manuscript to address R4's remarks about clarity and R2's and R3's remarks about a deeper discussion of failure cases and limitations.


Review for NeurIPS paper: A Causal View on Robustness of Neural Networks

Neural Information Processing Systems

Additional Feedback: Given fundamental limits of network robustness to adversarial attacks (see "Limitations of Adversarial Robustness: Strong No Free Lunch Theorem"), where does the proposed method differ from, or relate to, that general framework for robustness / adversaries? Does the causality framework provide a "way out" from the bounds and limits shown in that work? The lack of robustness to horizontal and vertical shift in the MNIST example seems as coupled to the architectural bias of the particular discriminator design as to the task itself -- for example, an object detection framework such as RCNN or modern variants (a la Mask-RCNN) should have little issue with the shifted-image task described in the paper. How can we separate the issue of network design (which is frequently driven by known invariances in the desired domain -- such as moving from simple DNNs to more applicable CNNs) from the causal manipulation model (which also has design parameters and potential pitfalls, as discussed in 3.2 and 4.2)? If using some kind of automated network design setting (such as meta-learning or evolutionary approaches), would both the CAMA model design and the discriminator itself need to be designed in conjunction, or through some kind of back-and-forth iteration?


Review for NeurIPS paper: A Causal View on Robustness of Neural Networks

Neural Information Processing Systems

On one hand, the manuscript is well-organized, and reviewers appreciated the probabilistic attempt at robustness and the fine-tuning idea. On the other hand, concerns were voiced in the reviews and during discussion. In the end, the meta-reviewer (after independent examination of the manuscript) concluded that the merits outweigh the potential issues. We strongly encourage the authors to revise the draft by taking the following comments into account (more in the reviews): (a) The causal aspect of the manuscript appears more decorative than necessary. Indeed, upon independent reading of the manuscript, the meta-reviewer agrees that one can essentially remove all causal notions; after all, the do-calculus on the simple model that the authors adopted is no different from ordinary conditioning. Besides, the causal model is never used for true interventions or for performing counterfactual inference.
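The meta-reviewer's point that do-calculus can collapse to ordinary conditioning is easy to check on a toy discrete model (the numbers below are illustrative, not taken from the paper): when the manipulation variable M has no parents, the truncated factorization for p(X | do(M=m)) coincides with the observational conditional p(X | M=m).

```python
# Toy check that do() equals conditioning when M has no parents
# (illustrative numbers; Y and M are independent causes of X).
p_y = {0: 0.5, 1: 0.5}
p_m = {0: 0.7, 1: 0.3}

def p_x1_given(y, m):
    """p(X = 1 | Y = y, M = m) in the toy model."""
    return 0.1 + 0.4 * y + 0.4 * m

def p_x1_do(m):
    """Interventional p(X = 1 | do(M = m)): the truncated
    factorization drops p(M) and sums out Y."""
    return sum(p_y[y] * p_x1_given(y, m) for y in p_y)

def p_x1_cond(m):
    """Observational p(X = 1 | M = m) by ordinary conditioning;
    p(M = m) cancels because M is independent of Y."""
    joint = sum(p_y[y] * p_m[m] * p_x1_given(y, m) for y in p_y)
    return joint / p_m[m]
```

Both functions return the same value for every m, which is exactly the reviewer's observation: without a confounder between the manipulation and the observation, the causal machinery adds nothing over conditional inference.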


A Survey of Out-of-distribution Generalization for Graph Machine Learning from a Causal View

Ma, Jing

arXiv.org Artificial Intelligence

Graph machine learning (GML) has been successfully applied across a wide range of tasks. Nonetheless, GML faces significant challenges in generalizing over out-of-distribution (OOD) data, which raises concerns about its wider applicability. Recent advancements have underscored the crucial role of causality-driven approaches in overcoming these generalization challenges. Distinct from traditional GML methods that primarily rely on statistical dependencies, causality-focused strategies delve into the underlying causal mechanisms of data generation and model prediction, thus significantly improving the generalization of GML across different environments. This paper offers a thorough review of recent progress in causality-involved GML generalization. We elucidate the fundamental concepts of employing causality to enhance graph model generalization and categorize the various approaches, providing detailed descriptions of their methodologies and the connections among them. Furthermore, we explore the incorporation of causality in other related important areas of trustworthy GML, such as explanation, fairness, and robustness. Concluding with a discussion on potential future research directions, this review seeks to articulate the continuing development and future potential of causality in enhancing the trustworthiness of graph machine learning.


A Causal View on Robustness of Neural Networks

Zhang, Cheng, Zhang, Kun, Li, Yingzhen

arXiv.org Machine Learning
